source("packageLoad.R")
packageLoad(c("dplyr",
"readr",
"sf",
"terra",
"tmap",
"ggplot2"))Lesson 2
Spatial Data Analysis
In Lesson 1 you were exposed to spatial data types and various databases you can pull spatial data from. Today we are going to use those data sets to perform a range of spatial analyses.
We briefly used the sf and terra packages yesterday, but today we will be exploring them much more in depth using a range of spatial operations they provide.
We have to start by reading in the packages we need for today. Some are repeats from Lesson 1, but we also have a couple new ones. Namely ggplot2 and tmap to do some quick visualizations of our spatial outputs. In Lesson 3 we will be using these packages for more advanced data visualizations.
Distance Calculations
We’re going to start off today with some distance calculations. Using our species occurrence data, say we want to know which species is often found closest to rivers and/or roads. We can answer this by finding each species average distance (across all occurrences) to our rivers and roads shapefiles.
First we have to read in the data. Our occurrences were saved as a csv file with lat/long. We can convert a non-spatial object (like a csv file) to a spatial sf object using the st_as_sf() function, specifying the CRS and lat/long columns.
occ <- read_csv("data/species_occ.csv") %>%
st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)Throughout today we are going to be mapping our spatial data to quickly inspect it and get a visual of the data’s extent and characteristics.
tmap is a great R package for spatial data visualization. It allows for both static (“plot” mode) and interactive (“view” mode) mapping options, which you can set using the function tmap_mode() . For today we will be making quick interactive plots. Once you set the mode with tmap_mode(), every plot call to tmap after that produces a plot in that mode.
tmap_mode("view")Quick view of all our points, colored by species:
tm_shape(occ) +
tm_symbols(col = "Species", size = 0.5)Another way to make a quick map is with tmap’s qtm() function, which stands for “Quick Thematic Map”
qtm(occ, symbols.col = "Species")Now, for each species we want to find their average distance to rivers and roads. This involves point to line distance calculations, which we can perform with the sf package.
First let’s read in our rivers and roads shapefiles. You can read in shapefiles with the st_read() function from sf, specifying the ‘.shp’ extension, which will work as long as all other accompanying spatial files (.shx, .dpf, .prj) are within the same folder.
rivers <- st_read("data/rivers.shp")Reading layer `rivers' from data source
`C:\Users\ccmothes\Desktop\R-for-Geospatial\data\rivers.shp'
using driver `ESRI Shapefile'
Simple feature collection with 11066 features and 5 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -106.1837 ymin: 40.25778 xmax: -104.9431 ymax: 40.99842
Geodetic CRS: NAD83
roads <- st_read("data/roads.shp")Reading layer `roads' from data source
`C:\Users\ccmothes\Desktop\R-for-Geospatial\data\roads.shp'
using driver `ESRI Shapefile'
Simple feature collection with 17737 features and 4 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -106.1944 ymin: 40.25778 xmax: -104.9431 ymax: 40.99844
Geodetic CRS: NAD83
Before performing any spatial operations, all of our spatial objects must be in the same CRS. We can see our spatial objects’ CRS when we print the object to the console, or we can get the full CRS details with the st_crs() function.
st_crs(rivers)Coordinate Reference System:
User input: NAD83
wkt:
GEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]]
st_crs(roads)Coordinate Reference System:
User input: NAD83
wkt:
GEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]]
st_crs(occ)Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
So our line shapefiles are in NAD83 and the occurrences are in WGS84. It generally doesn’t matter which CRS you choose to use, but here we are going to transform our occurrences to NAD83 so we only have to transform one object instead of two. Here we use the st_transform() function, specifying we want the new CRS for our occurrences to be that of the rivers shapefile.
occ <- st_transform(occ, crs = st_crs(rivers))Also, our occurrence data set covers all of Colorado, but rivers and roads are only for Larimer County. We have to first filter our points to the extent of the rivers and roads objects. However, the extent of these is a square bounding box, not the exact boundary of Larimer County. We can subset Larimer County from our Colorado counties object, and use st_filter() to filter points the are found within the Larimer County polygon.
counties <- st_read("data/CO_counties.shp")Reading layer `CO_counties' from data source
`C:\Users\ccmothes\Desktop\R-for-Geospatial\data\CO_counties.shp'
using driver `ESRI Shapefile'
Simple feature collection with 64 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -109.0602 ymin: 36.99245 xmax: -102.0415 ymax: 41.00344
Geodetic CRS: NAD83
occ_larimer <- st_filter(occ, counties[counties$NAME == "Larimer",])
qtm(occ_larimer)Great, now we just have species occurrences within Larimer County.
Now for each point we want to calculate its distance to the nearest river and road. Let’s start with rivers and then do the same for roads. The most efficient way is to first find the nearest line feature for each point. We can do this with the st_nearest_feature() function.
This function returns the index values (row number) of the river feature in the rivers spatial data frame that is closest in distance to each point. Here we are saving these index values in a new column of our Larimer occurrences that we will use later to calculate distances.
occ_larimer$nearest_river <- st_nearest_feature(occ_larimer, rivers)Now, for each point we can use the st_distance() function to calculate the distance to the nearest river feature, using the index value in our new “nearest_river” column. Adding by_element = TRUE is necessary to tell the function to perform the distance calculations by element (row), which we will fill into a new column “river_dist_m”.
occ_larimer$river_dist_m <- st_distance(occ_larimer, rivers[occ_larimer$nearest_river,], by_element = TRUE)Notice that the new column is more than just a numeric class, but a “units” class, specifying that the values are in meters.
str(occ_larimer)sf [1,613 × 7] (S3: sf/tbl_df/tbl/data.frame)
$ Species : chr [1:1613] "Elk" "Elk" "Elk" "Elk" ...
$ year : num [1:1613] 2022 2022 2022 2022 2022 ...
$ month : num [1:1613] 1 1 1 1 2 3 3 3 3 3 ...
$ basisOfRecord: chr [1:1613] "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" "HUMAN_OBSERVATION" ...
$ geometry :sfc_POINT of length 1613; first list element: 'XY' num [1:2] -105.5 40.4
$ nearest_river: int [1:1613] 1487 6 1019 2850 8370 8741 1466 1431 1431 1449 ...
$ river_dist_m : Units: [m] num [1:1613] 105.1 532.9 306.6 68.3 43.3 ...
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
..- attr(*, "names")= chr [1:6] "Species" "year" "month" "basisOfRecord" ...
Cool, now we have the distance to the nearest river (in meters) for each individual species occurrence. Now say we want the average distance for each species. We can do some data wrangling to get these values using the dplyr() package. Using the pipe %>% operator again, we perform a chain of operations on the data frame. group_by() specifies that all following operations should be performed individually by a grouping variable, in this case we want to apply operations on each individual species. Next summarise() calculates a new column “river_dist” that is the average (per species) river distance, where we have to convert our ‘units’ column to numeric to perform the mean() function.
occ_larimer %>%
group_by(Species) %>%
summarise(river_dist = (mean(as.numeric(river_dist_m))))Let’s make a quick bar plot to visually compare. We can pipe this output into a ggplot() object, specifying which variables go on the x and y axes within aes() and then we want to fill the bar color by species. geom_col() returns a barplot where the heights of the bars represent values in the data (in our case species average distance to a river).
occ_larimer %>%
group_by(Species) %>%
summarise(river_dist = (mean(as.numeric(river_dist_m)))) %>%
ggplot(aes(Species, river_dist, fill = Species)) +
geom_col()
A thing to note about ggplot2 is that it uses + instead of %>% to add additional elements.
Let’s mess around with this plot a little bit further to make it look more aesthetic.
occ_larimer %>%
group_by(Species) %>%
summarise(river_dist = mean(as.numeric(river_dist_m))) %>%
ggplot(aes(Species, river_dist, fill = Species)) +
geom_col() +
labs(y = "Average distance to nearest river (m)") +
theme(legend.position = "none") #removes the legend
Now lets do the same thing, but calculate average distance to the nearest road.
occ_larimer$nearest_road <- st_nearest_feature(occ_larimer, roads)
occ_larimer$road_dist_m <- st_distance(occ_larimer, roads[occ_larimer$nearest_road,], by_element = TRUE)
occ_larimer %>%
group_by(Species) %>%
summarise(road_dist = mean(as.numeric(road_dist_m))) %>%
ggplot(aes(Species, road_dist, fill = Species)) +
geom_col() +
labs(y = "Average distance to nearest road (m)") +
theme(legend.position = "none")
Buffers
Alternatively, say you want to know what percentage of species’ occurrences (points) were found within a certain distance of a river or a road (calculated buffer).
To do this we could add a buffer around our line features and filter the points that fall within that buffer zone. For this example let’s say we are interested in the 100 m buffer zone around rivers and roads. However, if you try this you’ll notice this operation takes quite a while.
river_buffer <- st_buffer(rivers, dist = 100)Instead, a more efficient way would be to make a 100 m buffer around each point, and see how many intersect with a river or road.
occ_buffer <- st_buffer(occ_larimer, dist = 100)Still takes a little bit of run time, but much faster than buffering each line feature. Our occ_buffer object is now a spatial polygon data frame, where each feature is an occurrence buffer with 100 m radius.
Spatial Intersect
We can conduct spatial intersect operations using the function st_intersects(). This function checks if each individual buffer intersects with a river, and if so it returns an index value (row number) for each river feature it intersects. This function returns a list object for each buffer polygon, that will be empty if there are no intersections. We will add this as a column to our buffer data set, and then create a binary yes/no river intersection column based on those results.
river_intersections <- st_intersects(occ_buffer, rivers)If we inspect this object, we see it is a list of the same length as our occ_buffer object, where each list element is either empty (no intersections) or a list of index numbers for the river features that do intersect that buffer.
We want to create a new column that returns TRUE/FALSE if the buffer intersects with a river. We do this by testing if the length of each element is greater than 0 (if not the element is empty and returns FALSE since there are no river intersections).
occ_buffer$river_100m <- lengths(river_intersections) > 0Now we can find out what percentage of occurrences are within 100 m of a river for each species using similar dplyr operations as before.
occ_buffer %>%
st_drop_geometry() %>% #we use this function to treat the object as a dataframe
group_by(Species) %>%
summarise(total_occ = n(), percent_river = (sum(river_100m == TRUE)/total_occ)*100)Now lets do another type of spatial intersection. Say we want to know what percent of each county is defined as ‘urban area’, using our urban areas polygons.
Let’s read in our urban areas polygon shapefile.
urban <- st_read("data/urban_areas.shp")Reading layer `urban_areas' from data source
`C:\Users\ccmothes\Desktop\R-for-Geospatial\data\urban_areas.shp'
using driver `ESRI Shapefile'
Simple feature collection with 64 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -108.7507 ymin: 37.14073 xmax: -102.2416 ymax: 40.75509
Geodetic CRS: NAD83
Now since we are wanting to calculate the actual area of intersection (not just whether or not an intersection exists as before) we can use the st_intersection() function.
urban_intersect <- st_intersection(counties, urban)This function returns the urban areas polygons as new polygons that intersect within each county, tied to the county information it lies within. We see that there are more rows in this data set than the original counties data set, meaning some urban areas cross multiple counties.
To clean this data to get the results we’re interested in (percentage of each county that is covered by urban areas) we will first calculate the area of each urban area intersection, sum the total intersecting areas per county if there are multiple, and divide that sum by total county area. We are using a new function here from the dplyr package, mutate(), which creates new columns based on specified values/calculations.
intersect_area <- urban_intersect %>%
mutate(intersect_area = st_area(.)) %>% #create a new column with shape area
dplyr::select(NAME, intersect_area) %>% #reduce to just county name and intersect area columns
group_by(NAME) %>% # group by county
summarise(intersect_area = sum(intersect_area)) %>% #new column that sums intersect area per county
st_drop_geometry() #drop the geometry to treat as a dataframeWe return this as a data frame since we just want the intersection area values per county. We can then join this data frame to our counties shapefile to calculate percent urban area coverage. We are going to create a new counties object called counties_attr that we will be adding a lot of other county-level attributes to. We are using another new function here left_join(), which is a member of dplyr’s join functions where in this case the output contains all rows in x, the first element, and joins y, the second element, by a matching column name/variable, in this case county name.
counties_attr <- counties %>%
mutate(county_area = st_area(.)) %>% #calculate county area
left_join(intersect_area, by = "NAME") %>% #join county to intersect area data
mutate(urban_coverage = as.numeric(intersect_area / county_area)) # calculate a new column that is the proportion of urban area coverageNow let’s visualize urban coverage by county
qtm(counties_attr, fill = "urban_coverage")We can also look at total urban area instead of a percentage.
qtm(counties_attr, fill = "intersect_area")A little more variation here but maybe nothing too exciting if you are familiar with Colorado metro areas. But at least you learned some cool new tools related to spatial intersections!
Spatial/Nonspatial Joins
We already used this tool a little already, but we can use left_join() to add a bunch more attributes to our counties shapefile. First lets add our census data, which was collected at the county level and saved as a csv.
census <- read_csv("data/census_data.csv") %>%
dplyr::select(-NAME)We first want to remove the ‘NAME’ column, because it is slightly different than the ‘NAME’ column in our counties data and therefore will not join properly. We can instead join by matching ‘GEOID’, which is a unique numeric ID given to each county.
counties_attr <- counties_attr %>%
left_join(census, by = "GEOID")Lastly, lets join our species occurrence data set. Say we want to know how many species occurrences are found in each county. Here we go back to the st_intersects() function we used before, which returns a list of all spatial elements that intersect with the spatial features of interest, in this case how many occurrence points intersect with each county polygon. We then nest this within lengths() which will return the number of intersecting features (occurrences) for each county, and we put those values as a new column called “species_count”.
counties_attr$species_count <- lengths(st_intersects(counties_attr, occ))Now we have a bunch of information tied to our county shapefile, we can explore it spatially and comparatively.
qtm(counties_attr, fill = "species_count")Use ggplot2 to visualize the relationship between different county attributes.
counties_attr %>%
st_drop_geometry() %>%
ggplot(aes(total_pop, species_count)) +
geom_point() +
stat_smooth()
# urban area related to co_born
counties_attr %>%
st_drop_geometry() %>%
ggplot(aes(urban_coverage, species_count)) +
geom_point() +
stat_smooth()
Raster Reclassification
So far we’ve dealt with a bunch of vector data and associated analyses with the sf package. Now lets work through some raster data analysis using the terra package.
Lets read in our land cover and elevation raster files. These are files you downloaded yesterday, but were already processed. You can read more about these data sets and how they were processed here. Land cover data comes from the National Land Cover Database (NLCD) and elevation data comes from the AWS Open Data Terrain Tiles. You can read in a raster file with the rast() function from terra.
landcover <- terra::rast("data/NLCD_CO.tif")
elevation <- terra::rast("data/elevation_1km.tif")qtm(landcover)This land cover data set includes attributes (land cover classes) associated with raster values. We can quickly view the frequency of each land cover type with the freq() function.
freq(landcover)Let’s view this as a bar chart.
freq(landcover) %>%
ggplot(aes(reorder(value, count), count)) +
labs(x = "") +
geom_col() +
coord_flip() # switch the axes to better view land cover class names
Say we want to explore some habitat characteristics of our species of interest, and we are specifically interested in forest cover. Our first step is to create a new raster layer from our land cover layer representing percent forest cover. This will involve multiple operations, including raster reclassification and focal statistics. Specifically, say we want to calculate the average percentage of forest cover and urbanization within a 9x9 pixel moving window.
First lets reclassify our land cover raster, creating a new raster representing just forest/non-forest pixels.
Since rasters are technically matrices, we can index and change values using matrix operations. Given this particular raster has attributes (land cover class names) associated with values, we can index by those names.
#first assign landcover to a new object name so we can manipulate it while keeping the origian
forest <- landcover
#where the raster equals any of the forest categories, set that value to 1
forest[forest %in% c("Deciduous Forest", "Evergreen Forest", "Mixed Forest")] <- 1
#SPELLING IS IMPORTANT
#now set all non forest pixels to NA
forest[forest != 1] <- NALets look at our new forest layer
plot(forest)
Focal Statistics
Now we are going to perform focal statistics, which is a spatial operation that calculates new values for each cell based on a specified moving window. For this example we are going to calculate within a 9x9km moving window (since our pixel resolution is 1km). We supply this to the w = argument as a matrix, where the first value is the weight of each pixel, and the second two are the number of rows and columns. Second we use the “sum” function, since each forest pixel has a value of 1 we will get the total number of forest pixels within the moving window, and then later divide the values by the total number of pixels in the window (81) to get the percentage. The final raster values will represent for each pixel the surrounding forest percentage (within ~4.5 km radius).
Note: We are just making up these distance numbers to demonstrate the use of terra functions. In reality when working on your own research questions you should spend time thinking about the most appropriate values to use for your study species/system.
forest_pct <- terra::focal(forest, w=matrix(1,9,9), fun = "sum", na.rm = TRUE)forest_pct <- forest_pct/81plot(forest_pct)
Next, we wanted to know the percent forest cover associated with each species occurrence. Since we are now working with multiple spatial objects, we have to first check they are all in the same CRS and if not transform the data before any spatial operations.
crs(forest_pct)[1] "PROJCRS[\"Albers Conical Equal Area\",\n BASEGEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]],\n CONVERSION[\"Albers Equal Area\",\n METHOD[\"Albers Equal Area\",\n ID[\"EPSG\",9822]],\n PARAMETER[\"Latitude of false origin\",23,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8821]],\n PARAMETER[\"Longitude of false origin\",-96,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8822]],\n PARAMETER[\"Latitude of 1st standard parallel\",29.5,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8823]],\n PARAMETER[\"Latitude of 2nd standard parallel\",45.5,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8824]],\n PARAMETER[\"Easting at false origin\",0,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8826]],\n PARAMETER[\"Northing at false origin\",0,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8827]]],\n CS[Cartesian,2],\n AXIS[\"easting\",east,\n ORDER[1],\n LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]],\n AXIS[\"northing\",north,\n ORDER[2],\n LENGTHUNIT[\"metre\",1,\n ID[\"EPSG\",9001]]]]"
st_crs(occ)Coordinate Reference System:
User input: NAD83
wkt:
GEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]]
Looks like the raster layer is in a different CRS. Let’s reproject this so we can use it with our vector data (which are all in NAD83). We can project raster data to a new CRS with the project() function from terra.
One thing to note is that while terra does work with vector data, it wants them to be in a special format called a SpatVector , instead of an sf object. Luckily they have made it quick and easy to convert between the data formats using the vect() function, so we just need to remember to nest our sf objects within that function when using them in terra functions.
forest_pct <- project(forest_pct, vect(occ))Raster Extract
Now we can use the extract() function to extract the raster pixel value at each occurrence.
terra::extract(forest_pct, vect(occ))Notice that this returns a 2 column data frame, with an ID for each feature (occurrence) and the extracted raster value in the second column. We want to add these raster values as a new column to our occurrence data set, so we need to index just this second column of the extract() output.
occ$forest_pct <- terra::extract(forest_pct, vect(occ))[,2]Now let’s do the same extract method with our elevation raster, pulling the elevation value at each species occurrence.
Again, we need to first project the raster to the CRS of the occurrence data set.
elevation <- terra::project(elevation, vect(occ))Now we can do the same extraction as with the forest raster, putting the values in a new ‘elevation’ column
occ$elevation <- terra::extract(elevation, vect(occ))[,2]Let’s do some data frame operations to calculate and compare average forest cover across species
occ %>%
group_by(Species) %>%
summarise(avg_forest_pct = mean(forest_pct, na.rm = TRUE))occ %>%
group_by(Species) %>%
summarise(avg_forest_pct = mean(forest_pct, na.rm = TRUE)) %>%
ggplot(aes(Species, avg_forest_pct, fill = Species))+
geom_col() +
theme(legend.position = "bottom")
Looks like elk are associated with the most forested habitats. We can also view the spread of values with a box plot.
ggplot(occ, aes(Species, forest_pct)) +
geom_boxplot()
How about elevation?
ggplot(occ, aes(Species, elevation)) +
geom_boxplot()
Yellow-bellied marmots are found at the highest elevations, on average.
That’s one way to use the extract() function. We can also extract raster values within polygons, and supply a function to summarize those raster values.
Say we wanted to know the most common land cover type in each county. We can use extract() with the function ‘modal’ to return the most common type for each polygon, and we will add this as a new column.
Again we need to project our land cover raster.
landcover_prj <- project(landcover, vect(counties))terra::extract(landcover_prj, vect(counties), fun = "modal")This works similar to what we ran with the occurrence dataset…however we notice that it returns the raw raster values instead of the named land cover classes (which we want). This raster behaves such that values are named factors (as long as we have the land cover attribute file in the same folder), and we can pull this metadata with the cats() function.
cats(landcover)[[1]]This returns a single element list that is a data frame with a bunch of information. We just want to keep ‘value’ and ’NLCD Land Cover Class”, and remove all the empty classes. We have to first index the first element of the list to operate on just the data frame, then we can apply some dplyr functions.
nlcd_classes <- cats(landcover)[[1]] %>%
dplyr::select("value", nlcd_class = "NLCD Land Cover Class") %>%
filter(nlcd_class != "")Cool, now we can tie this to our counties data frame once we have the most common land cover value calculated with extract
counties$common_landcover <- terra::extract(landcover_prj, vect(counties), fun = "modal")[,2]This join is different from our others because the variable we want to join by has a different column name in each data set (even though the values match), but we can specify this within the by = argument like this:
counties <- counties %>%
left_join(nlcd_classes, by = c("common_landcover" = "value"))We can now get some summary statistics on the most common land cover types per county.
counties %>%
st_drop_geometry() %>% #for ggplot, don't need geometry
group_by(nlcd_class) %>%
summarise(n = n()) %>%
ggplot(aes(nlcd_class, n, fill = nlcd_class)) +
geom_col()+
theme(legend.position = "none")+
coord_flip()
And map it out by county:
tm_shape(counties)+
tm_polygons("nlcd_class")Save Data
We’ve added new columns to some of our data frames and created new ones. Let’s save these objects so we can use them in Lesson 3 without having to re-run all this code.
Yesterday we learned how to save shapefiles with st_write(). Another way to save data in R is by saving them as R objects, which is beneficial as it reduces file size. You can save individual R objects as .RDS files, or multiple R objects as .RData files. Since we have multiple objects we want to save and load back into our environment tomorrow, we will write them to a single .RData file.
Before writing to file, let’s get rid of some of the extra columns we don’t need. counties_attr is one of the main objects we will be using for visualizations tomorrow. Here we can de-select a range of columns by putting a - in front.
counties_attr <- counties_attr %>%
dplyr::select(-c(NAMELSAD:INTPTLON))Now we use the save() function, listing all the objects in our current environment that we want to save, and the file name and location with the .RData extension.
save(counties_attr, occ_buffer, occ_larimer, occ, file = "data/objects.RData")We will start Lesson 3 by reading these objects back into our environment (so you can close out of this R Studio session at the end of the day if you want).